pRactice corner: Functions

lruolin

R4DS Practice 15: Functions

The code below is from the practice exercises in https://r4ds.had.co.nz/, written with reference to the solutions at https://jrnold.github.io/r4ds-exercise-solutions/

Let’s begin now

Loading tidyverse package.

library(tidyverse)
library(tibble)

Introduction

Why are functions important?

They help to automate common tasks, rather than copying and pasting, thus minimising human error.

When should you write a function?

When you need to copy and paste anything more than … twice.

# rnorm - random generation for normal distribution

df <- tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

df

# A tibble: 10 x 4
        a      b       c       d
    <dbl>  <dbl>   <dbl>   <dbl>
 1  2.33  -0.557  0.889  -1.45  
 2 -0.650  0.272 -1.92   -0.0814
 3 -0.647  0.299  1.10    0.243 
 4  0.586  0.373  0.175   0.845 
 5 -0.953 -1.04  -0.410  -1.86  
 6 -0.293 -1.52   0.888  -1.29  
 7  0.654  0.169 -0.766  -0.775 
 8  0.980 -0.423 -0.280   0.292 
 9 -1.96   0.162  0.0332  1.08  
10 -0.182 -0.454 -0.667   0.753

# Manual coding

df$a <- (df$a - min(df$a, na.rm = TRUE)) /
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) /
  (max(df$b, na.rm = TRUE) - min(df$b, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) /
  (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) /
  (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))

# How to reduce copying, pasting, and manual replacing?

# Identify the number of inputs:

# - 1 variable: a numeric vector

x <- df$a
(x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))

 [1] 1.0000000 0.3054532 0.3061631 0.5937726 0.2347157 0.3887426
 [7] 0.6097964 0.6857463 0.0000000 0.4145540

range <- range(x, na.rm = TRUE)
range # good practice to give names to intermediate calculations

[1] 0 1

# After trying the code out on a simple input,
# you can turn it into a function:

# a. choose a name for the function

# b. list the inputs: function(input variable)

# c. place the code into the body of the function

rescale01 <- function(x) {
  range <- range(x, na.rm = T)
  (x - range[1])/(range[2] - range[1])
}

rescale01(c(0,5,10))

[1] 0.0 0.5 1.0
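
With rescale01() in hand, the four hand-edited formulas from the manual version shrink to one call per column. The sketch below shows that payoff on a fresh data frame. Note that `df2`, the base `data.frame()` and the fixed seed are assumptions made here so the example runs on its own and is reproducible; they are not part of the original exercise.

```r
# Same rescaling function as above
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

set.seed(1)  # assumption: fixed seed, only for reproducibility
df2 <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

# One call per column replaces the copy-and-paste formulas
df2$a <- rescale01(df2$a)
df2$b <- rescale01(df2$b)
df2$c <- rescale01(df2$c)
df2$d <- rescale01(df2$d)

# Each column should now run from 0 to 1
sapply(df2, range)
```

There is still some repetition here (one line per column); `df2[] <- lapply(df2, rescale01)` would do all columns in one go.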

# What if there are Inf values?

x <- c(1:10, Inf)
x

 [1]   1   2   3   4   5   6   7   8   9  10 Inf

rescale01(x) # not an error, but unhelpful: every finite value collapses to 0 and Inf becomes NaN

 [1]   0   0   0   0   0   0   0   0   0   0 NaN

# Let's fix the function
rescale01_inf <- function(x) {
  range <- range(x, na.rm = T, finite = T)
  (x - range[1])/(range[2] - range[1])
}

rescale01_inf(x)

 [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556
 [7] 0.6666667 0.7777778 0.8888889 1.0000000       Inf

# What if you want to map -Inf to 0, and Inf to 1?

  range <- range(x, na.rm = T, finite = T)
  y <- (x - range[1])/(range[2] - range[1])
  y[y ==-Inf] <- 0
  y[y ==Inf] <- 1
  y

 [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556
 [7] 0.6666667 0.7777778 0.8888889 1.0000000 1.0000000

# put into function
rescale01_inf_b <- function(x) {
  range <- range(x, na.rm = T, finite = T)
  y <- (x - range[1])/(range[2] - range[1])
  y[y==-Inf] <- 0
  y[y==Inf] <- 1
  y
}

rescale01_inf(x)

 [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556
 [7] 0.6666667 0.7777778 0.8888889 1.0000000       Inf

rescale01_inf_b(x)

 [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556
 [7] 0.6666667 0.7777778 0.8888889 1.0000000 1.0000000
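
The examples above only exercise the Inf branch. As a quick check of the other branch, the sketch below feeds the same function a vector that also contains -Inf and an NA. The test vector `z` is made up for illustration.

```r
# Same function as above, repeated so this block runs on its own
rescale01_inf_b <- function(x) {
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  y <- (x - rng[1]) / (rng[2] - rng[1])
  y[y == -Inf] <- 0
  y[y == Inf] <- 1
  y
}

# Hypothetical test vector: both infinities plus a missing value
z <- c(-Inf, 0, 5, NA, 10, Inf)
rescale01_inf_b(z)
# -Inf maps to 0, Inf maps to 1, finite values are rescaled, NA stays NA
```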

Practice turning the following code snippets into functions

# to calculate the proportion of na values

x <- c(0, 1, 2, NA, 4, NA)

mean(is.na(x)) # number of NA as proportion

[1] 0.3333333

# write the function
prop_na <- function(x) {
  mean(is.na(x))
  
}

prop_na(x)

[1] 0.3333333
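
Once prop_na() exists, it can be reused on every column of a data frame instead of one vector at a time. A minimal sketch, assuming a small made-up data frame `dat` (not from the original exercises):

```r
prop_na <- function(x) {
  mean(is.na(x))
}

# Hypothetical data frame with a different share of NA in each column
dat <- data.frame(
  p = c(1, NA, 3, NA),
  q = c(NA, 2, 3, 4),
  r = 1:4
)

# Apply the function to each column; returns a named numeric vector
sapply(dat, prop_na)
```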

# to standardize the vector so that it sums to 1
x/sum(x, na.rm = T)

[1] 0.0000000 0.1428571 0.2857143        NA 0.5714286        NA

# write the function

sum_to_one <- function(x, na.rm = F){
  x/sum(x, na.rm = na.rm)
  
}

sum_to_one(1:5)

[1] 0.06666667 0.13333333 0.20000000 0.26666667 0.33333333

sum_to_one(c(1:5, NA))

[1] NA NA NA NA NA NA

sum_to_one(c(1:5, NA), na.rm = T)

[1] 0.06666667 0.13333333 0.20000000 0.26666667 0.33333333         NA

# to calculate the coefficient of variation

sd(x, na.rm = T)/mean(x, na.rm = T)

[1] 0.9759001

calc_coefficient_variation <- function(x, na.rm = F){
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
  
}

calc_coefficient_variation(1:5)

[1] 0.5270463
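
The na.rm argument is passed through to both sd() and mean(), so it behaves the same way as in sum_to_one(). A short sketch of that behaviour, using a shorter hypothetical name `coef_var` so the block runs on its own:

```r
# coef_var is a hypothetical stand-in for the function above
coef_var <- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

coef_var(c(1:5, NA))                # NA, because the missing value propagates
coef_var(c(1:5, NA), na.rm = TRUE)  # same result as coef_var(1:5)
```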

Compute the sample variance

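variance <- function(x, na.rm = TRUE) {
  # drop missing values first so that n counts only observed values
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  sq_err <- (x - m)^2
  # parentheses matter: divide by (n - 1), not (sum / n) - 1
  sum(sq_err) / (n - 1)
}

variance(1:10)

[1] 9.166667

var(1:10) # built-in, for comparison

[1] 9.166667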
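
A simple way to gain confidence in a hand-written variance is to compare it with the built-in var() on input that contains a missing value. The helper `sample_variance()` and the vector `v` below are hypothetical and only here so the check is self-contained; this is one possible way to handle NA, not the only one.

```r
# Hypothetical helper: sample variance that drops NA before counting
sample_variance <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  sum((x - m)^2) / (n - 1)
}

v <- c(2, 4, NA, 6, 8)

sample_variance(v)
var(v, na.rm = TRUE)

# TRUE if the two agree up to floating-point tolerance
all.equal(sample_variance(v), var(v, na.rm = TRUE))
```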

Compute the skewness

skewness <- function(x, na.rm = F) {
  n <- length(x)
  m <- mean(x, na.rm = na.rm)
  v <- var(x, na.rm = na.rm)
  sum((x-m)^3 / (n-2)) / v^(3/2)
  
}

skewness(c(1,2,5,100))

[1] 1.494554
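
A few sanity checks help confirm the sign of the result behaves as expected: symmetric data should give 0, a long right tail a positive value, and the mirrored data the same magnitude with a negative sign. The inputs below are chosen for illustration.

```r
# Same function as above, repeated so this block runs on its own
skewness <- function(x, na.rm = FALSE) {
  n <- length(x)
  m <- mean(x, na.rm = na.rm)
  v <- var(x, na.rm = na.rm)
  sum((x - m)^3 / (n - 2)) / v^(3 / 2)
}

skewness(c(1, 2, 3))         # symmetric around the mean
skewness(c(1, 2, 5, 100))    # long right tail
skewness(-c(1, 2, 5, 100))   # mirrored: long left tail
```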

Write a function: both_na(), that takes two vectors of the same length and returns the number of positions that have an NA in both vectors.

x <- c(1:10, NA)
x

 [1]  1  2  3  4  5  6  7  8  9 10 NA

y <- c(1:10, NA)
y

 [1]  1  2  3  4  5  6  7  8  9 10 NA

sum(is.na(x) & is.na(y))

[1] 1

# write the function

both_na <- function(x, y) {
  sum(is.na(x) & is.na(y))
}

both_na(
  c(NA, 1,2,4),
  c(NA, NA, 1, 4)
)

[1] 1
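
The exercise says the two vectors have the same length, but both_na() itself does not check this. If one length is a multiple of the other, R silently recycles the shorter vector and still returns a number. A minimal sketch of a stricter variant follows; the name `both_na_checked` is hypothetical.

```r
# Hypothetical variant that refuses vectors of different lengths
both_na_checked <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(is.na(x) & is.na(y))
}

both_na_checked(
  c(NA, 1, 2, 4),
  c(NA, NA, 1, 4)
)

# Mismatched lengths now stop with an error instead of recycling;
# try() is used here only so the script keeps running
try(both_na_checked(c(NA, 1), c(NA, NA, 1, 4)))
```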

Learning points

Functions aren’t as daunting as I thought. Writing one can be broken down into steps:

1. Know what you want the function to automate.
2. Identify the input variables.
3. Try out the code on a simple input.
4. Wrap the code in a function and give it a proper name.
5. Even better, compile it into a package for future use.

Reference

https://r4ds.had.co.nz/

https://jrnold.github.io/r4ds-exercise-solutions/
